A BERNSTEIN-VON MISES THEOREM IN THE
NONPARAMETRIC RIGHT-CENSORING MODEL

By Yongdai Kim and Jaeyong Lee 
Seoul National University 

In the recent Bayesian nonparametric literature, many examples
have been reported in which Bayesian estimators and posterior
distributions do not achieve the optimal convergence rate, indicating
that the Bernstein-von Mises theorem does not hold. In this article,
we give a positive result in this direction by showing that the
Bernstein-von Mises theorem holds in survival models for a large class
of prior processes neutral to the right. We also show that, for an
arbitrarily given convergence rate $n^{-\alpha}$ with
$0<\alpha\le 1/2$, a prior process neutral to the right can be chosen
so that its posterior distribution achieves the convergence rate
$n^{-\alpha}$.

1. Introduction. The asymptotic properties of posterior distributions 
and Bayes estimators in nonparametric models have been given much at- 
tention in the recent literature. Diaconis and Freedman (1986) opened the 
discussion in this area by showing that in nonparametric models even an 
innocent looking prior can produce an inconsistent posterior. This disturb- 
ing result stirred Bayesians, because it says that a Bayesian can be more 
and more sure of a wrong parameter value as the sample size increases. It 
also initiated research efforts to garner "safe" priors in the asymptotic sense. 
For the research work regarding posterior consistency, see Freedman (1963), 
Schwartz (1965), Barron, Schervish and Wasserman (1999) and Ghosal, 
Ghosh and Ramamoorthi (1999). In the context of survival models, Kim 
and Lee (2001) showed that not all the prior processes neutral to the right 
have consistent posterior distributions and gave sufficient conditions for the 
consistency.

Cox (1993) and Zhao (2000) showed that this unfortunate phenomenon
continues to occur in the posterior convergence rate. For example, Zhao 
(2000) showed that in an infinite dimensional normal model, there is no in- 
dependent normal prior supported on the parameter space that has a Bayes 
estimator that attains the optimal minimax rate. (In the same article, how- 
ever, she constructed a class of priors, mixtures of normal priors supported 
on the parameter space, which achieves the optimal minimax rate.) These 
examples cast doubt on the Bernstein-von Mises theorem in nonparametric 
models even with the prior that has a consistent posterior. 

The Bernstein-von Mises theorem states that the posterior distribution 
centered at the maximum likelihood estimator (MLE) is asymptotically 
equivalent to the sampling distribution of the MLE. Due to the recent ad- 
vent of the Markov chain Monte Carlo method, Bayesians' computational 
ability exceeds that of frequentists. In the situations where frequentists do 
not have a computational tool while Bayesians do, frequentists often use the 
Bayesian credible set as a frequentist confidence interval. The theoretical jus- 
tification of this practice is the Bernstein-von Mises theorem. Hence, if the 
Bernstein-von Mises theorem does not hold, this practice is not warranted. 
The Bernstein-von Mises theorem is equally important to Bayesians as
well, because invalidity of the Bernstein-von Mises theorem often means 
that a Bayesian credible set has zero efficiency relative to the frequentist 
confidence interval. 

In this article we provide a positive result in this direction by
showing that the Bernstein-von Mises theorem does hold in survival
models for a large class of prior processes. Indeed, for popular prior
processes such as Dirichlet, beta and gamma processes, the
Bernstein-von Mises theorem holds. The situation is subtle, however.
In an example provided in Section 4, we also show that for any given
$0<\alpha\le 1/2$, there is a consistent prior process neutral to the
right that has a posterior convergence rate that is exactly
$n^{-\alpha}$. This result suggests that, for a given model and data,
one prior process can be much slower at extracting information from
the data than another. This confirms the findings in the literature
that posterior consistency does not guarantee the optimal convergence
rate and that in practice a prior must be carefully examined before it
is used. In the same example, an interesting prior process is found.
This prior process achieves the optimal posterior convergence rate,
but its posterior distribution is not equivalent to the sampling
distribution of the MLE; hence, the Bernstein-von Mises theorem does
not hold. This example shows that the optimal convergence rate does
not guarantee the Bernstein-von Mises theorem.

The Bernstein-von Mises theorem for parametric models is a well-known
result. See, for instance, Section 7.4.2 of Schervish (1995) and
references therein. Previous research on the Bernstein-von Mises
theorem for nonparametric models includes Lo (1983, 1986, 1993),
Brunner and Lo (1996), Diaconis and Freedman (1998), Conti (1999) and
Freedman (1999). Among them, Lo (1983, 1986, 1993), Brunner and Lo
(1996) and Conti (1999) reported some of the earlier positive results
on the Bernstein-von Mises theorem for some nonparametric models. See
also Ghosal, Ghosh and van der Vaart (2000) and Shen and Wasserman
(2001) for a related theory of posterior convergence rates.

In Section 2 the survival model and prior processes neutral to the
right are briefly introduced. In Section 3 the main result of this
article, the Bernstein-von Mises theorem for survival models, is
given. In Section 4 a class of prior processes with arbitrary posterior
convergence rate $n^{-\alpha}$, $0<\alpha\le 1/2$, and a simulation
study are given. The proof of the Bernstein-von Mises theorem is given
in Section 5.

2. Survival models and processes neutral to the right. Let
$X_1,\ldots,X_n$ be i.i.d. survival times with cumulative distribution
function (c.d.f.) $F$ and let $C_1,\ldots,C_n$ be independent censoring
times with c.d.f. $G$, independent of the $X_i$'s. Since the
observations are subject to right censoring, we observe only
$(T_1,\delta_1),\ldots,(T_n,\delta_n)$, where $T_i=\min(C_i,X_i)$ and
$\delta_i=I(X_i\le C_i)$. Let
$D_n=\{(T_1,\delta_1),\ldots,(T_n,\delta_n)\}$. Let $A$ be the
cumulative hazard function (c.h.f.) of $F$,
$A(t)=\int_0^t dF(s)/(1-F(s-))$.
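
As a minimal sketch of the observation scheme just described, the following Python fragment builds the observed pairs $(T_i,\delta_i)$ from survival and censoring times. The function name `observe` and the toy data are hypothetical and not part of the article; the convention $\delta_i=I(X_i\le C_i)$ is assumed.

```python
def observe(survival_times, censoring_times):
    """Return the observed pairs (T_i, delta_i) under right censoring."""
    pairs = []
    for x, c in zip(survival_times, censoring_times):
        t = min(x, c)               # T_i = min(C_i, X_i)
        delta = 1 if x <= c else 0  # delta_i = I(X_i <= C_i), assumed convention
        pairs.append((t, delta))
    return pairs


# Hypothetical toy data (not from the article).
X = [2.0, 0.5, 3.1, 1.2]
C = [1.5, 1.0, 4.0, 1.2]
D_n = observe(X, C)
print(D_n)
```

Only the first subject is censored here, since its censoring time precedes its survival time.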

We say that a prior process on c.d.f. $F$ is a process neutral to the
right if the corresponding c.h.f. $A$ is a nonstationary subordinator
(a positive nondecreasing independent increment process) such that
$A(0)=0$, $0\le\Delta A(t)\le 1$ for all $t$ with probability 1 and
either $\Delta A(t)=1$ for some $t>0$ or
$\lim_{t\to\infty}A(t)=\infty$ with probability 1. See Doksum (1974)
for the original definition of processes neutral to the right and see
Hjort (1990), Kim (1999) and Kim and Lee (2001) for the connection
between the definition given here and Doksum's definition. In what
follows, the term subordinator is used for a prior process of c.h.f.
$A$ which induces a process neutral to the right on $F$.

Kim (1999) used the following characterization of subordinators. This
characterization can be dated back to Lévy [see the note in Breiman
(1968), page 318]. A similar characterization can also be found in
Theorem 6.3.VIII in Daley and Vere-Jones (1988) and Theorem 3 in
Fristedt and Gray [(1997), page 606]. For any given subordinator
$A(t)$ on $[0,\infty)$, there exists a unique random measure $\mu$ on
$[0,\infty)\times[0,1]$ such that

$$A(t)=\int_{[0,t]\times[0,1]} x\,\mu(ds,dx). \tag{1}$$

In fact, $\mu$ is defined by

$$\mu([0,t]\times B)=\sum_{s\le t} I(\Delta A(s)\in B)$$

for any Borel subset $B$ of $[0,1]$ and for all $t>0$. Since $\mu$ is a
Poisson random measure [Jacod and Shiryaev (1987), page 70], there
exists a unique $\sigma$-finite measure $\nu$ on
$[0,\infty)\times[0,1]$ such that

$$E(\mu([0,t]\times B))=\nu([0,t]\times B) \tag{2}$$

for all $t>0$. Conversely, for a given $\sigma$-finite measure $\nu$
such that

$$\int_0^t\int_0^1 x\,\nu(ds,dx)<\infty$$

for all $t$, there exists a unique Poisson random measure $\mu$ on
$[0,\infty)\times[0,1]$ which satisfies (2) [Jacod (1979)] and so we
can construct a subordinator $A$ through (1). Thus, we can use $\nu$ to
characterize a subordinator $A$.

Suppose that a given subordinator $A$ has fixed discontinuity points at
$t_1<t_2<\cdots$ and that the Lévy formula is given by

$$E(\exp(-\theta A(t)))=\Bigl[\prod_{t_i\le t}E(\exp(-\theta\,\Delta A(t_i)))\Bigr]\exp\Bigl(-\int_0^1(1-e^{-\theta x})\,dL_t(x)\Bigr),$$

where $L_t(x)$ is the Lévy measure. Then it can be shown [see Theorem
II.4.8 in Jacod and Shiryaev (1987)] that

$$\nu([0,t]\times B)=\int_B dL_t(x)+\sum_{t_i\le t}\int_B dH_i(x)$$

for all $t>0$ and for any Borel set $B$ of $[0,1]$, where $H_i(x)$ is
the distribution function of $\Delta A(t_i)$. When there are no fixed
discontinuities, $\mu$ is a Poisson random measure defined on
$[0,\infty)\times[0,1]$ with intensity measure $\nu$ and
$dL_t(x)=\int_0^t\nu(ds,dx)$. Hence, the measure $\nu$ simply extends
$dL_t$ by incorporating the fixed discontinuity points. However, this
simple extension provides a convenient notational device. The
posterior distribution, which typically has many fixed discontinuity
points, can be summarized neatly by use of the corresponding measure
$\nu$ without separating out the stochastically continuous part and
the fixed discontinuity points as was done in previous work [Ferguson
and Phadia (1979) and Hjort (1990)]. For this reason, we call $\nu$
simply the Lévy measure of $A$.

From the Lévy measure $\nu$, we can easily calculate the mean and
variance of the subordinator using the formulas [Kim (1999)]

$$E(A(t))=\int_0^t\int_0^1 x\,\nu(ds,dx) \tag{3}$$

and

$$\operatorname{Var}(A(t))=\int_0^t\int_0^1 x^2\,\nu(ds,dx)-\sum_{s\le t}\Bigl(\int_0^1 x\,\nu(\{s\},dx)\Bigr)^2.$$

These formulas constitute basic facts for the asymptotic theory of the
posterior and will be used subsequently herein.
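
To illustrate the two moment formulas, the following sketch evaluates them numerically for the beta process Lévy measure of Example 1, under the simplifying assumptions (made here for illustration only) that the scale parameter $c$ is constant and the prior mean is $\Lambda(t)=t$. There are then no fixed discontinuities, so the sum in the variance formula vanishes, and the closed forms are $E(A(t))=t$ and $\operatorname{Var}(A(t))=t/(c+1)$. The function name `levy_moment` and the midpoint rule are illustrative choices.

```python
def levy_moment(k, t, c, grid=20000):
    """Midpoint-rule value of int_0^t int_0^1 x^k nu(ds, dx) for the measure
    nu(ds, dx) = c * x**(-1) * (1 - x)**(c - 1) dx ds (constant c, Lambda(s) = s)."""
    h = 1.0 / grid
    inner = 0.0
    for j in range(grid):
        x = (j + 0.5) * h
        # x^k times the density c * x^(-1) * (1 - x)^(c - 1)
        inner += x ** (k - 1) * c * (1.0 - x) ** (c - 1) * h
    # The density does not depend on s, so the s-integral contributes a factor t.
    return t * inner


c, t = 2.0, 3.0
prior_mean = levy_moment(1, t, c)  # formula (3)
prior_var = levy_moment(2, t, c)   # variance formula without fixed jumps
print(prior_mean, prior_var)
```

With $c=2$ and $t=3$ the two values should be close to $3$ and $3/(2+1)=1$, respectively.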

The characterization of subordinators with Lévy measures is also
convenient in representing the posterior distribution, for the class of
processes neutral to the right is conjugate with respect to right
censored survival data. Suppose a priori $A$ is a subordinator with
Lévy measure

$$\nu(ds,dx)=f_s(x)\,dx\,ds\qquad\text{for } s\ge 0 \text{ and } 0\le x\le 1, \tag{4}$$

with $\lim_{t\to\infty}\int_0^t\int_0^1 x f_s(x)\,dx\,ds=\infty$. Then
the posterior distribution of $A$ given $D_n$ is again a subordinator
with Lévy measure $\nu^p$ given by

$$\nu^p(ds,dx)=(1-x)^{Y_n(s)}f_s(x)\,dx\,ds+dH_s(x)\,\frac{1}{\Delta N_n(s)}\,dN_n(s), \tag{5}$$

where $H_s(x)$ is a distribution function on $[0,1]$ and is defined by

$$dH_s(x)\propto x^{\Delta N_n(s)}(1-x)^{Y_n(s)-\Delta N_n(s)}f_s(x)\,dx$$

and $N_n(t)=\sum_{i=1}^n I(T_i\le t,\delta_i=1)$,
$Y_n(t)=\sum_{i=1}^n I(T_i\ge t)$,
$\Delta N_n(t)=N_n(t)-N_n(t-)$. Note that the posterior process is the
sum of stochastically continuous and discrete parts, which correspond
to the first and the second terms in (5), respectively. Note also that
$H_s$ is the distribution of the jump size at $s$ if
$\Delta N_n(s)\ne 0$. This fact is used later. For the proof of (5),
see Hjort (1990) or Kim (1999).
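
The jump distribution $H_s$ can be made concrete with a small numerical sketch. Assuming, for illustration only, a beta process prior with constant scale $c$ (so that $f_s(x)\propto x^{-1}(1-x)^{c-1}$ in $x$), the unnormalized density of $H_s$ at an uncensored time with $\Delta N_n(s)=1$ is $(1-x)^{Y_n(s)+c-2}$, a Beta$(1,Y_n(s)+c-1)$ density, whose mean is $1/(Y_n(s)+c)$. The function name `posterior_jump_mean` is hypothetical.

```python
def posterior_jump_mean(Y, dN, c, grid=20000):
    """Mean of H_s by the midpoint rule, where
    dH_s(x) is proportional to x^dN (1-x)^(Y-dN) f_s(x) dx
    and f_s(x) is proportional to x^(-1) (1-x)^(c-1) (assumed beta process prior)."""
    h = 1.0 / grid
    num = 0.0
    den = 0.0
    for j in range(grid):
        x = (j + 0.5) * h
        w = x ** (dN - 1) * (1.0 - x) ** (Y - dN + c - 1)  # unnormalized density
        num += x * w
        den += w
    return num / den


# Ten subjects at risk, one observed failure, scale parameter c = 2.
jump_mean = posterior_jump_mean(Y=10, dN=1, c=2.0)
print(jump_mean, 1.0 / (10 + 2.0))
```

The two printed numbers should agree closely, which shows how the posterior jump sizes shrink like $1/Y_n(s)$ as the risk set grows.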

Let $F_0$ be the true distribution of the $X_i$'s and let $A_0$ be the
c.h.f. of $F_0$. We will study the asymptotic behavior of $A$ on a
fixed compact interval $[0,\tau]$. Throughout this article we assume
the following two conditions:

Condition C1. $F_0(\tau-)<1$ and $G(\tau-)<1$.

Condition C2. $A_0$ is continuous on $[0,\tau)$.

Condition C1 guarantees that $Y_n(\tau)\to\infty$ as $n\to\infty$ with
probability 1, which is essential for the asymptotic theory of survival
models. Condition C2 implies that $\Delta N_n(s)$ has a value of either
0 or 1.

3. Bernstein-von Mises theorem. Assume that a priori $A$ is a
nonstationary subordinator with Lévy measure

$$\nu([0,t]\times B)=\int_0^t\int_B\frac{1}{x}\,g_s(x)\,dx\,\lambda(s)\,ds, \tag{6}$$

where $\int_0^1 g_t(x)\,dx=1$ for all $t\in[0,\tau]$.

Remark. Comparing (4) and (6), we can see that
$\lambda(t)=\int_0^1 x f_t(x)\,dx$ and $g_t(x)=x f_t(x)/\lambda(t)$
provided $\lambda(t)>0$.

We need the following conditions for the Bernstein-von Mises theorem:

Condition A1. $g^*=\sup_{t\in[0,\tau],\,x\in[0,1]}(1-x)g_t(x)<\infty$.

Condition A2. There exists a function $q(t)$ defined on $[0,\tau]$ such
that $0<\inf_{t\in[0,\tau]}q(t)\le\sup_{t\in[0,\tau]}q(t)<\infty$ and,
for some $\alpha>0$ and $\varepsilon>0$,

$$\sup_{t\in[0,\tau],\,x\in[0,\varepsilon]}\left|\frac{g_t(x)-q(t)}{x^{\alpha}}\right|<\infty.$$

Condition A3. $\lambda(t)$ is bounded and positive on $(0,\tau)$.

The convergence rate of the posterior distribution depends mainly on
the behavior of the prior process in the neighborhood of 0. This is
because the jump sizes of the posterior process get smaller as $n$ gets
larger. Condition A1 is a technical one to make the posterior mass of
the jump sizes of the fixed discontinuity points outside the
neighborhood of 0 asymptotically negligible. Condition A2 is the main
condition, in which $\alpha$ measures the smoothness of $g_t(x)$ in $x$
around 0. The constant $\alpha$ plays a crucial role in determining the
convergence rate of the posterior distribution. In fact, the
Bernstein-von Mises theorem may not hold if $\alpha\le 1/2$. For an
example, see Section 4. The boundedness of $\lambda$ in Condition A3
makes the posterior distribution eventually be dominated by data. The
positiveness of $\lambda$ in Condition A3 is also necessary. Suppose
$\lambda(t)=0$ for $t\in[c,d]$, where $0<c<d<\tau$. Then both the prior
and posterior put mass 1 on the set of c.h.f.s $A$ with $A(d)=A(c)$.
Hence the posterior distribution of $A(d)-A(c)$ has mass 1 at 0 and the
Bernstein-von Mises theorem does not hold unless $A_0(d)=A_0(c)$.

Before stating the theorems, we introduce some notation. For a given
random variable $Z_n$, we write $Z_n=O(n^{s})$ with probability 1 if
there exists a constant $M>0$ such that $|Z_n|/n^{s}\le M$ for all but
finitely many $n$ with probability 1. Let $\delta_a$ be the degenerate
probability measure at $a$. Denote by $\mathcal{L}(X|Y)$ the
conditional distribution of $X$ given $Y$. Let $W$ be a standard
Brownian motion and let $\hat A_n$ be the Aalen-Nelson estimator
defined by $\hat A_n(t)=\int_0^t dN_n(s)/Y_n(s)$. The sampling
distribution of $\sqrt{n}(\hat A_n-A_0)$ converges in distribution to
$W(U_0(\cdot))$, where $U_0(t)=\int_0^t dA_0(s)/Q(s)$, with
$Q(t)=\Pr(T_1\ge t)$ [see Theorem IV.1.2 in Andersen, Borgan, Gill and
Keiding (1993)]. Here $U_0$ is well defined, because
$\inf_{t\in[0,\tau]}Q(t)>0$ due to Condition C1.
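
For readers who wish to compute the Aalen-Nelson estimator, a minimal sketch follows. It assumes the data are given as a list of pairs $(T_i,\delta_i)$ and uses $Y_n(s)=\sum_i I(T_i\ge s)$ for the number at risk; the function name `nelson_aalen` and the toy data are hypothetical.

```python
def nelson_aalen(data, t):
    """Aalen-Nelson estimate at t: sum over distinct uncensored times s <= t
    of dN_n(s) / Y_n(s), for data given as (T_i, delta_i) pairs."""
    event_times = sorted({T for T, d in data if d == 1 and T <= t})
    total = 0.0
    for s in event_times:
        dN = sum(1 for T, d in data if T == s and d == 1)  # jump of N_n at s
        Y = sum(1 for T, d in data if T >= s)              # number at risk at s
        total += dN / Y
    return total


# Hypothetical toy data: three failures and one censored observation at 1.5.
data = [(0.5, 1), (1.2, 1), (1.5, 0), (3.1, 1)]
print(nelson_aalen(data, 2.0))  # 1/4 + 1/3
```

The estimator is a step function that jumps by $1/Y_n(s)$ at each uncensored time when there are no ties.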

The following theorem is a general result on the convergence of the
posterior distribution. The Bernstein-von Mises theorem and an example
of suboptimal convergence rates in Section 4 will be based on this
theorem. Let $q_n$ be the number of distinct uncensored observations
and let $t_1<t_2<\cdots<t_{q_n}$ be the distinct uncensored
observations. Let $A_d(t)=\sum_{i:\,t_i\le t}\Delta A(t_i)$. Let
$D[0,\tau]$ be the space of cadlag functions on $[0,\tau]$ equipped with
the uniform topology and the ball $\sigma$-field.

Theorem 1. Under Conditions A1-A3:

(i) $\mathcal{L}(\sqrt{n}(A(\cdot)-A_d(\cdot))\,|\,D_n)\stackrel{d}{\to}\delta_0$
on $D[0,\tau]$ with probability 1;

(ii) $\mathcal{L}(\sqrt{n}(A_d(\cdot)-E(A_d(\cdot)|D_n))\,|\,D_n)\stackrel{d}{\to}W(U_0(\cdot))$
on $D[0,\tau]$ with probability 1;

(iii) $\sup_{t\in[0,\tau]}|E(A_d(t)|D_n)-\hat A_n(t)|=O(n^{-\min\{1,\alpha\}})$
with probability 1.

The proof is given in Section 5.

Part (i) of Theorem 1 states that the stochastically continuous part
of the posterior process, $A-A_d$, vanishes at a rate faster than the
optimal rate, $n^{-1/2}$. Part (ii) states that the fixed discontinuous
part of the posterior process, $A_d$, centered at its mean is
asymptotically equivalent to the frequentist sampling distribution of
$\hat A_n$, since $W(U_0(t))$ in Theorem 1(ii) is the limiting sampling
distribution of $\sqrt{n}(\hat A_n(t)-A_0(t))$. Part (iii) states that
the difference between the posterior mean of $A_d$ and $\hat A_n$
vanishes with varying order, $n^{-\min\{1,\alpha\}}$, for $\alpha>0$.
Hence, if $\alpha<1/2$, the overall convergence rate of the posterior
distribution could be dominated by the convergence rate of (iii), which
results in suboptimal convergence rates. Indeed, in Section 4 such an
example is given.

Although a rigorous proof of Theorem 1 is given in Section 5, we sketch 
the proof here. For (i), we first approximate the first two moments of the 
posterior distribution of A by those of the posterior with a beta process 
prior (see Example 1 for a definition of beta process) . Since the closed forms 
of the first two moments of the posterior with the beta process prior are 
known [Hjort (1990)], one can easily prove (i) using Lemma 7. Part (iii) 
is proved similarly. For (ii), the posterior distribution of A d consists of the 
sum of independent random variables, and so the central limit theorem for 
independent random variables [e.g., Theorem 19 in Section V.4 in Pollard 
(1984)] can be applied. 

Theorem 2 (Bernstein-von Mises theorem). Under Conditions A1-A3
with $\alpha>1/2$,

$$\mathcal{L}(\sqrt{n}(A(\cdot)-\hat A_n(\cdot))\,|\,D_n)\stackrel{d}{\to}W(U_0(\cdot))$$

on $D[0,\tau]$ with probability 1.

Proof. This theorem is an immediate consequence of Theorem 1, because
we can decompose

$$n^{1/2}(A(t)-\hat A_n(t))=n^{1/2}(A(t)-A_d(t))+n^{1/2}(A_d(t)-E(A_d(t)|D_n))+n^{1/2}(E(A_d(t)|D_n)-\hat A_n(t)).\qquad\Box$$

Corollary 1. Under the same conditions as in Theorem 2,

$$\mathcal{L}(\sqrt{n}(S(\cdot)-\hat S_n(\cdot))\,|\,D_n)\stackrel{d}{\to}-S_0(\cdot)W(U_0(\cdot))$$

on $D[0,\tau]$ with probability 1, where $S$, $\hat S_n$ and $S_0$ are the
survival functions corresponding to $A$, $\hat A_n$ and $A_0$.

Proof. Note that the survival function is recovered from the
cumulative hazard function by the product integration operator, which
is Hadamard differentiable. The result follows from the functional
delta method. See Gill (1989). □
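
The product integration step can be sketched for a pure-jump cumulative hazard, for which the product integral reduces to $S(t)=\prod_{s\le t}(1-\Delta A(s))$. The function name `survival_from_hazard_jumps` and the jump values below are hypothetical; applied to the jumps of the Aalen-Nelson estimator, this map yields the Kaplan-Meier estimator.

```python
def survival_from_hazard_jumps(jumps, t):
    """Product integral of a pure-jump c.h.f.:
    S(t) = product over jump times s <= t of (1 - dA(s))."""
    s_val = 1.0
    for s, dA in sorted(jumps):
        if s <= t:
            s_val *= 1.0 - dA
    return s_val


# Hypothetical jumps (time, size) of a cumulative hazard estimate.
jumps = [(0.5, 1.0 / 4.0), (1.2, 1.0 / 3.0), (3.1, 1.0)]
print(survival_from_hazard_jumps(jumps, 2.0))  # (3/4) * (2/3)
```

A jump of size 1 sends the survival function to 0, which matches the requirement $0\le\Delta A(t)\le 1$ for a c.h.f.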

Remark. If Conditions A1-A3 as well as Condition C1 hold for all
$\tau>0$, Theorem 2 and Corollary 1 are valid on $D[0,\infty)$, because
weak convergence on $D[0,\infty)$ is defined by weak convergence on
$D[0,\tau]$ for all $\tau>0$ [Pollard (1984)].

A convenient sufficient condition for Condition A2 with $\alpha=1$ can
be given as follows. Suppose that for some $\varepsilon>0$

$$\sup_{t\in[0,\tau],\,x\in(0,\varepsilon)}|g_t^{(1)}(x)|<\infty, \tag{7}$$

where $g_t^{(1)}(x)$ is the first derivative of $g_t(x)$ in $x$ on
$[0,1]$. Then, by the mean value theorem, Condition A2 holds with
$\alpha=1$ and $q(t)=g_t(0)$.

In the next three examples, we illustrate that the Bernstein-von Mises 
theorem holds for beta, Dirichlet and gamma prior processes. 

Example 1 (Beta processes). The beta process with mean $\Lambda$ and
scale parameter $c$ is a nonstationary subordinator with Lévy measure
$\nu$, $\nu(dt,dx)=c(t)x^{-1}(1-x)^{c(t)-1}\,dx\,d\Lambda(t)$. Suppose
$\Lambda(t)=\int_0^t\lambda(s)\,ds$, where $\lambda(t)$ is positive and
continuous on $(0,\tau)$ and
$0<\inf_{t\in[0,\tau]}c(t)\,(=c_*)\le\sup_{t\in[0,\tau]}c(t)\,(=c^*)<\infty$.
Condition A1 is true because

$$\sup_{t\in[0,\tau],\,x\in[0,1]}|(1-x)g_t(x)|=\sup_{t\in[0,\tau],\,x\in[0,1]}|c(t)(1-x)^{c(t)}|\le c^*<\infty.$$

For Condition A2, since
$g_t^{(1)}(x)=-c(t)(c(t)-1)(1-x)^{c(t)-2}$, we have

$$\sup_{t\in[0,\tau],\,x\in(0,\varepsilon)}|g_t^{(1)}(x)|\le c^*(c^*+1)\max\{1,(1-\varepsilon)^{c_*-2}\}.$$

Thus, by (7), Condition A2 holds with $q(t)=c(t)$. Since Condition A3
is assumed, the Bernstein-von Mises theorem holds.
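
The bound used for Condition A2 in Example 1 can be checked numerically. The sketch below assumes a constant scale parameter $c$ (so $c_*=c^*=c$), takes $g(x)=c(1-x)^{c-1}$, $q=c$ and $\alpha=1$, and compares the largest value of $|g(x)-q|/x$ on a grid in $(0,\varepsilon]$ with the bound $c(c+1)\max\{1,(1-\varepsilon)^{c-2}\}$. The function names and the grid are illustrative assumptions.

```python
def a2_ratio(c, eps, grid=1000):
    """Largest value of |g(x) - q| / x on a grid in (0, eps],
    for g(x) = c * (1 - x)**(c - 1) and q = c (Condition A2 with alpha = 1)."""
    worst = 0.0
    for j in range(1, grid + 1):
        x = eps * j / grid
        g = c * (1.0 - x) ** (c - 1.0)
        worst = max(worst, abs(g - c) / x)
    return worst


def a2_bound(c, eps):
    """Bound from the mean value theorem in Example 1 with c_* = c^* = c."""
    return c * (c + 1.0) * max(1.0, (1.0 - eps) ** (c - 2.0))


for c in (0.5, 1.0, 3.0):
    print(c, a2_ratio(c, 0.5), a2_bound(c, 0.5))
```

For each value of $c$ the first number stays below the second, as the mean value theorem argument predicts.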
Example 2 (Dirichlet processes). Hjort (1990) showed that when the
prior of the distribution $F$ is the Dirichlet process with base
measure $\alpha$, the induced prior of the c.h.f. is the beta process
with $c(t)=\alpha([0,\infty))(1-H(t))$ and $\Lambda(t)$, the c.h.f. of
$H(t)$, where $H(t)=\alpha([0,t])/\alpha([0,\infty))$. Suppose
$\Lambda(t)=\int_0^t\lambda(s)\,ds$. If $\lambda(t)$ is positive and
bounded on $(0,\tau)$ and $H(\tau)<1$, then, as in Example 1, it can be
shown that Conditions A1-A3 are satisfied.

Example 3 (Gamma processes). A priori, assume that
$Y(t)=-\log(1-F(t))$ is a gamma process with parameters $(H(t),d(t))$
with $H(t)=\int_0^t h(s)\,ds$. Here the gamma process with parameters
$(H(t),d(t))$ is defined by $Y(t)=\int_0^t d(s)^{-1}\,dX(s)$, where
$X(t)$ is a subordinator whose marginal distribution is a gamma
distribution with parameters $(\int_0^t d(s)\,dH(s),1)$. See Lo (1982)
for details. This prior process was used by Doksum (1974), Kalbfleisch
(1978) and Ferguson and Phadia (1979). Since

$$\log E(\exp(-\theta Y(t)))=\int_0^t\int_0^\infty(e^{-\theta x}-1)\frac{d(s)}{x}\exp(-d(s)x)\,dx\,dH(s),$$

it can be shown that the c.h.f. $A$ of $F$ is a subordinator with Lévy
measure $\nu$ given by

$$\nu([0,t]\times B)=\int_0^t c(s)\int_B\frac{1}{-\log(1-x)}(1-x)^{d(s)-1}\,dx\,d\Lambda(s),$$

where

$$c(t)=\Bigl(\int_0^1\frac{x}{-\log(1-x)}(1-x)^{d(t)-1}\,dx\Bigr)^{-1}$$

and

$$\Lambda(t)=\int_0^t\frac{d(s)h(s)}{c(s)}\,ds.$$

Therefore, we have

$$g_t(x)=c(t)\frac{x}{-\log(1-x)}(1-x)^{d(t)-1},\qquad 0\le x\le 1,$$

and $\lambda(t)=d(t)h(t)/c(t)$.

Suppose $h(t)$ is positive and bounded on $t\in(0,\tau)$ and
$0<\inf_{t\in[0,\tau]}d(t)\,(=d_*)\le\sup_{t\in[0,\tau]}d(t)\,(=d^*)<\infty$.
We will show that Conditions A1-A3 hold under these conditions. First,
we show that
$0<\inf_{t\in[0,\tau]}c(t)\,(=c_*)\le\sup_{t\in[0,\tau]}c(t)\,(=c^*)<\infty$.
Note that

$$\frac{1}{c(t)}=\int_0^1\frac{x}{-\log(1-x)}(1-x)^{d(t)-1}\,dx\le m\int_0^1(1-x)^{d_*/2-1}\,dx=\frac{m}{d_*/2},$$

where $m=\sup_{x\in[0,1]}-x(1-x)^{d_*/2}/\log(1-x)$. By a similar
argument, we can show that $\sup_{t\in[0,\tau]}c(t)<\infty$. Now,
Condition A1 follows because

$$\sup_{t\in[0,\tau],\,x\in[0,1]}|(1-x)g_t(x)|=\sup_{t\in[0,\tau],\,x\in[0,1]}\left|c(t)\frac{x}{-\log(1-x)}(1-x)^{d(t)}\right|\le\sup_{x\in[0,1]}\left|c^*\frac{x(1-x)^{d_*}}{\log(1-x)}\right|\le c^*\sup_{x\in[0,1]}\left|\frac{x}{\log(1-x)}\right|<\infty.$$

Similarly, Condition A2 can be shown by (7) with $q(t)=c(t)$ and
Condition A3 follows from $\inf_{t\in[0,\tau]}c(t)>0$.

4. An example: suboptimal convergence rates. In this section, we show
that, for a given convergence rate $n^{-\alpha}$ with
$0<\alpha\le 1/2$, there exists a prior process neutral to the right
whose posterior convergence rate is $n^{-\alpha}$. Consider the class
of prior processes neutral to the right with Lévy measure

$$\nu_\alpha(dt,dx)=\frac{1}{x}(1+x^{\alpha})\,dx\,dt,\qquad x\in(0,1],\ t\ge 0. \tag{8}$$

In the next theorem we show that, for each $0<\alpha\le 1/2$, the
posterior with the prior process $\nu_\alpha$ achieves convergence rate
$n^{-\alpha}$.

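Theorem 3. A priori let $A$ be a subordinator with Lévy measure
$\nu_\alpha$ in (8). Then:

(i) For $0<\alpha<1/2$,
$\mathcal{L}(n^{\alpha}(A(\cdot)-\hat A_n(\cdot))\,|\,D_n)\stackrel{d}{\to}\delta_{J_\alpha(\cdot)}$
on $D[0,\tau]$ with probability 1, where
$J_\alpha(t)=\alpha\Gamma(\alpha+1)\int_0^t dA_0(s)/Q^{\alpha}(s)$.

(ii) For $\alpha=1/2$,
$\mathcal{L}(n^{1/2}(A(\cdot)-\hat A_n(\cdot))\,|\,D_n)\stackrel{d}{\to}W(U_0(\cdot))+J_{1/2}(\cdot)$
on $D[0,\tau]$ with probability 1.

(iii) For $\alpha>1/2$,
$\mathcal{L}(n^{1/2}(A(\cdot)-\hat A_n(\cdot))\,|\,D_n)\stackrel{d}{\to}W(U_0(\cdot))$
on $D[0,\tau]$ with probability 1.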

Remark 1. When $0<\alpha<1/2$, the posterior convergence rate is
$n^{-\alpha}$, which is slower than the optimal rate $n^{-1/2}$.

Remark 2. When $\alpha=1/2$, the posterior convergence rate is
optimal, but the limiting posterior distribution is the limiting
sampling distribution of the Aalen-Nelson estimator plus a bias term
$J_{1/2}$. So the Bayesian credible set does not have appropriate
frequentist coverage probability, although it has the optimal posterior
convergence rate.

Remark 3. The Bernstein-von Mises theorem holds when $\alpha>1/2$.
Although we do not know whether Conditions A1-A3 are necessary and
sufficient conditions for the Bernstein-von Mises theorem, this example
shows that these conditions are fairly minimal.

To prove Theorem 3 we need the following lemma, the proof of which can
be found in Appendix A.

Lemma 1. For $0<\alpha\le 1/2$,

$$\sup_{t\in[0,\tau]}|n^{\alpha}(E(A_d(t)|D_n)-\hat A_n(t))-J_\alpha(t)|\to 0$$

with probability 1.
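
The bias term in Lemma 1 can be illustrated numerically. Under the prior (8), the posterior mean of a jump at an uncensored time with $Y$ subjects at risk is $(B_1+B_{\alpha+1})/(B_0+B_\alpha)$, where $B_a=\Gamma(a+1)\Gamma(Y^++1)/\Gamma(Y^++a+2)$ and $Y^+=Y-1$ (see Appendix A), whereas the Aalen-Nelson jump is $1/Y$. The sketch below takes $Y=n$ (that is, $Q=1$, an assumption made only for this illustration) and checks that $n^{\alpha}(Y\cdot\text{jump mean}-1)$ approaches $\alpha\Gamma(\alpha+1)$, the integrand of $J_\alpha$ at $Q=1$. The function names are hypothetical.

```python
import math


def log_B(a, y_plus):
    """log of B_a = Gamma(a + 1) * Gamma(y_plus + 1) / Gamma(y_plus + a + 2)."""
    return (math.lgamma(a + 1.0) + math.lgamma(y_plus + 1.0)
            - math.lgamma(y_plus + a + 2.0))


def scaled_bias(alpha, Y):
    """Y^alpha * (Y * E(jump | data) - 1) at one uncensored time with Y at risk,
    under the prior (8); Y plays the role of n (assumes Q = 1)."""
    y_plus = Y - 1.0
    B0 = math.exp(log_B(0.0, y_plus))
    B1 = math.exp(log_B(1.0, y_plus))
    Ba = math.exp(log_B(alpha, y_plus))
    Ba1 = math.exp(log_B(alpha + 1.0, y_plus))
    jump_mean = (B1 + Ba1) / (B0 + Ba)  # posterior mean of the jump
    return Y ** alpha * (Y * jump_mean - 1.0)


def limit(alpha):
    """Limiting value alpha * Gamma(alpha + 1) at Q = 1."""
    return alpha * math.gamma(alpha + 1.0)


for alpha in (0.25, 0.5):
    print(alpha, scaled_bias(alpha, 1e6), limit(alpha))
```

The approach to the limit is slow, with an error of order $n^{-\alpha}$, so a large value of $Y$ is used.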

Proof of Theorem 3. It is easy to see that $\nu_\alpha$ in (8)
satisfies Conditions A1-A3 with $q(t)=(\alpha+1)/(\alpha+2)$ and
$\lambda(t)=(\alpha+2)/(\alpha+1)$. Now note that

$$n^{\alpha}(A(t)-\hat A_n(t))=n^{\alpha}(A(t)-A_d(t))+n^{\alpha}(A_d(t)-E(A_d(t)|D_n))+n^{\alpha}(E(A_d(t)|D_n)-\hat A_n(t)).$$

Here, for $\alpha>1/2$, the normalizing rate $n^{\alpha}$ is understood
as $n^{1/2}$. The first term of the right-hand side converges weakly to
0 for all $\alpha>0$; the second term converges weakly to 0 for
$\alpha<1/2$ and converges weakly to $W(U_0(\cdot))$ for
$\alpha\ge 1/2$ by Theorem 1. Finally, the third term converges weakly
to $J_\alpha$ for $\alpha\le 1/2$ and converges weakly to 0 for
$\alpha>1/2$ by Lemma 1. Hence, the proof is complete by Slutsky's
theorem. □

Fig. 1. Empirical coverage probabilities of the Bayesian credible set
of $A(2)$, the c.h.f. at $t=2$, with nominal level 90%. Empirical
coverage probabilities are based on 1000 data sets for each of sample
sizes $n=10,50,100,500,1000,2000,5000$ with the prior (8) at
$\alpha=0.25,0.5,1,5$. The three solid lines represent the nominal
level and 2 standard errors away from it. The dotted lines are the
empirical coverage probabilities.

Theorem 3 shows that the posterior with the prior (8) with
$\alpha>1/2$ can be used to construct an asymptotically valid
frequentist confidence interval, while the posterior with
$\alpha\le 1/2$ cannot. A simulation study was conducted to see the
effect of $\alpha$ and sample size $n$ on empirical coverage
probability. Right censored data were generated from Exponential(1)
for the survival time and Exponential(0.25) for the right-censoring
times, which amounts to censoring probability 0.2. For each of seven
sample sizes $n=10,50,100,500,1000,2000,5000$, 1000 data sets were
generated. The posterior distribution was computed, based on an
algorithm modified from Lee and Kim (2004), for each data set with the
prior (8) for $\alpha=0.25,0.5,1,5$. The empirical coverage probability
is the proportion of the data sets that have credible sets of $A(2)$,
the c.h.f. at $t=2$, that contain the true value $A_0(2)=2$. The
simulation result is reported in Figure 1. The three solid lines
represent the nominal coverage probability 0.9 and 2 standard errors,
$2\sqrt{0.9\cdot 0.1/1000}=0.01897$, away from it. The coverage
probability with $\alpha=0.25$ gets worse as the sample size grows.
When $\alpha=0.5$, the coverage probability shows a difference from the
nominal level which does not get smaller as the sample size increases.
However, with $\alpha=1$ and 5, the coverage probability is inside the
error bounds from $n=100$ on. All of these agree with Theorem 3.
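
Two numerical facts quoted in the simulation design can be verified with a short sketch: the censoring probability of about 0.2 and the width of the two-standard-error band. The sketch assumes that Exponential(1) and Exponential(0.25) are parameterized by their rates, in which case $\Pr(C<X)=0.25/(1+0.25)=0.2$. The function name, the seed and the Monte Carlo sample size are illustrative choices, and the posterior computation itself is not reproduced here.

```python
import math
import random


def censoring_fraction(n, rng):
    """Monte Carlo fraction of censored observations when the survival time is
    Exponential with rate 1 and the censoring time is Exponential with rate 0.25."""
    censored = 0
    for _ in range(n):
        x = rng.expovariate(1.0)   # survival time
        c = rng.expovariate(0.25)  # censoring time (rate parameterization assumed)
        if c < x:
            censored += 1
    return censored / n


rng = random.Random(0)
frac = censoring_fraction(200000, rng)
band = 2.0 * math.sqrt(0.9 * 0.1 / 1000)  # two standard errors at level 0.9
print(frac, band)
```

The first value should be near 0.2 and the second equals 0.01897 to five decimal places.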

5. Proof of Theorem 1. Throughout this section, the conditions of
Theorem 1 are assumed. Let
$B(a,b)=\int_0^1 x^{a-1}(1-x)^{b-1}\,dx$. Then Stirling's formula
yields that, for $a>0$,

$$\lim_{n\to\infty}n^{a}B(a,n)=\Gamma(a). \tag{9}$$

Lemma 2. Let $W_n$ be a sequence of nonnegative stochastic processes
on $[0,\tau]$ such that

$$\sup_{t\in[0,\tau]}|W_n(t)/n-Q(t)|\to 0 \tag{10}$$

with probability 1. Then

$$\sup_{t\in[0,\tau]}|n^{k}B(k,W_n(t))-\Gamma(k)/Q^{k}(t)|\to 0$$

with probability 1 as $n\to\infty$, for every integer $k\ge 1$.
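
The limit (9) is easy to check numerically. The sketch below evaluates $n^{a}B(a,n)$ on the log scale, using $B(a,n)=\Gamma(a)\Gamma(n)/\Gamma(a+n)$, and compares it with $\Gamma(a)$; the function name `scaled_beta` is hypothetical.

```python
import math


def scaled_beta(a, n):
    """n^a * B(a, n), computed on the log scale to avoid overflow."""
    log_beta = math.lgamma(a) + math.lgamma(n) - math.lgamma(a + n)
    return math.exp(a * math.log(n) + log_beta)


for a in (0.5, 1.0, 2.5):
    print(a, scaled_beta(a, 1e6), math.gamma(a))
```

For $n=10^6$ the relative difference between the two columns is of order $1/n$.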



Proof. We can write

$$n^{k}B(k,W_n(t))=\Bigl(\frac{n}{W_n(t)}\Bigr)^{k}W_n^{k}(t)B(k,W_n(t)).$$

Since (10) implies $\inf_{t\in[0,\tau]}W_n(t)\to\infty$ with
probability 1, (9) yields

$$\sup_{t\in[0,\tau]}|W_n^{k}(t)B(k,W_n(t))-\Gamma(k)|\to 0$$

with probability 1. Also (10) implies

$$\sup_{t\in[0,\tau]}\left|\Bigl(\frac{n}{W_n(t)}\Bigr)^{k}-\frac{1}{Q^{k}(t)}\right|\to 0$$

with probability 1, which completes the proof. □

Let $Y_n^+(t)=Y_n(t)-\Delta N_n(t)$ and
$C_k(t)=\int_0^1 x^{k}(1-x)^{Y_n^+(t)}g_t(x)\,dx$.

Lemma 3. For $k\ge 0$,

$$\sup_{t\in[0,\tau]}|C_k(t)-q(t)B(k+1,Y_n^+(t)+1)|=O(n^{-(k+1+\alpha)}) \tag{11}$$

and

$$\sup_{t\in[0,\tau]}|n^{k+1}C_k(t)-q(t)\Gamma(k+1)Q^{-(k+1)}(t)|\to 0 \tag{12}$$

with probability 1.

Proof. For (11), let $p_t(x)=(1-x)(g_t(x)-q(t))/x^{\alpha}$. Then
Conditions A1 and A2 together imply
$\sup_{t\in[0,\tau],\,x\in[0,1]}|p_t(x)|\,(=p^*)<\infty$. Now

$$|C_k(t)-q(t)B(k+1,Y_n^+(t)+1)|=\left|\int_0^1 x^{k+\alpha}(1-x)^{Y_n^+(t)-1}p_t(x)\,dx\right|\le p^*B(k+\alpha+1,Y_n^+(t)).$$

Since $\sup_{t\in[0,\tau]}|Y_n^+(t)/n-Q(t)|\to 0$ with probability 1,
Lemma 2 yields

$$\sup_{t\in[0,\tau]}|C_k(t)-q(t)B(k+1,Y_n^+(t)+1)|=O(n^{-(k+1+\alpha)})$$

with probability 1. Equation (12) is an easy consequence of (11) and
Lemma 2. □



Lemma 4. We have

$$\sup_{t\in[0,\tau]}\left|\frac{C_k(t)}{C_0(t)}-\frac{B(k+1,Y_n^+(t)+1)}{B(1,Y_n^+(t)+1)}\right|=O(n^{-(k+\alpha)}) \tag{13}$$

and

$$\sup_{i=1,\ldots,q_n}E((\Delta A_d(t_i))^{k}\,|\,D_n)=O(n^{-k}). \tag{14}$$
Proof. For (13), we can write

$$\left|\frac{C_k(t)}{C_0(t)}-\frac{B(k+1,Y_n^+(t)+1)}{B(1,Y_n^+(t)+1)}\right|\le\left|\frac{C_k(t)-q(t)B(k+1,Y_n^+(t)+1)}{C_0(t)}\right| \tag{15}$$

$$\qquad+\left|\frac{B(k+1,Y_n^+(t)+1)\,(C_0(t)-q(t)B(1,Y_n^+(t)+1))}{C_0(t)B(1,Y_n^+(t)+1)}\right|. \tag{16}$$

Since (12) yields

$$\inf_{t\in[0,\tau]}nC_0(t)\to\inf_{t\in[0,\tau]}q(t)Q^{-1}(t)>0, \tag{17}$$

(11) implies

$$n^{k+\alpha}\sup_{t\in[0,\tau]}\left|\frac{C_k(t)-q(t)B(k+1,Y_n^+(t)+1)}{C_0(t)}\right|\le\frac{n^{k+\alpha+1}\sup_{t\in[0,\tau]}|C_k(t)-q(t)B(k+1,Y_n^+(t)+1)|}{\inf_{t\in[0,\tau]}nC_0(t)}=O(1)$$

with probability 1, and hence (15) is $O(n^{-(k+\alpha)})$. On the other
hand, since Lemma 2 yields

$$\inf_{t\in[0,\tau]}nB(1,Y_n^+(t)+1)\to\inf_{t\in[0,\tau]}Q^{-1}(t)>0,$$

(11) together with Lemma 2 and (17) implies

$$n^{k+\alpha}\sup_{t\in[0,\tau]}\left|\frac{B(k+1,Y_n^+(t)+1)\,(C_0(t)-q(t)B(1,Y_n^+(t)+1))}{C_0(t)B(1,Y_n^+(t)+1)}\right|\le\frac{\sup_{t\in[0,\tau]}|n^{k+1}B(k+1,Y_n^+(t)+1)|}{\inf_{t\in[0,\tau]}nC_0(t)}\cdot\frac{\sup_{t\in[0,\tau]}|n^{1+\alpha}(C_0(t)-q(t)B(1,Y_n^+(t)+1))|}{\inf_{t\in[0,\tau]}nB(1,Y_n^+(t)+1)}=O(1)$$

with probability 1, and so (16) is $O(n^{-(k+\alpha)})$, which completes
the proof of (13).

For (14), note that the distribution function of $\Delta A_d(t_i)$ is
$H_{t_i}(x)$. Hence
$E((\Delta A_d(t_i))^{k}\,|\,D_n)=C_k(t_i)/C_0(t_i)$, which together
with (13) completes the proof.

□ 
Proof of Theorem 1(i). Since a posteriori $A-A_d$ is a Lévy process
with Lévy measure $\nu^{c}$ given by
$\nu^{c}(dt,dx)=x^{-1}(1-x)^{Y_n(t)}g_t(x)\,dx\,\lambda(t)\,dt$,
Condition A1 and Lemma 2 with (3) imply

$$E(A(t)-A_d(t)\,|\,D_n)\le g^*\int_0^{\tau}\lambda(s)\,ds\,B(1,Y_n(\tau))=O(n^{-1})$$

with probability 1. Similarly,

$$\operatorname{Var}(A(t)-A_d(t)\,|\,D_n)\le g^*\int_0^{\tau}\lambda(s)\,ds\,B(2,Y_n(\tau))=O(n^{-2})$$

with probability 1. Hence the proof is completed by Lemma 7 (in
Appendix B). □

Proof of Theorem 1(ii). Let
$Z_n(t)=\sqrt{n}(A_d(t)-E(A_d(t)|D_n))$. Since $Z_n$ is a Lévy
process, we can utilize Theorem 19 in Section V.4 in Pollard (1984).
We first prove the convergence of finite dimensional distributions by
showing Lyapounov's condition. Suppose $0\le s<t\le\tau$ are given.
Note that

$$Z_n(t)-Z_n(s)=\sum_{s<t_i\le t}\sqrt{n}\,(\Delta A_d(t_i)-E(\Delta A_d(t_i)|D_n)).$$

Let

$$V_n=\sup_{i=1,\ldots,q_n}E\bigl((\sqrt{n}(\Delta A_d(t_i)-E(\Delta A_d(t_i)|D_n)))^{4}\,\big|\,D_n\bigr).$$

Then (14) in Lemma 4 implies $V_n=O(n^{-2})$ with probability 1.
Because $\sup_{t\in[0,\tau]}\int_0^t dN_n(u)=O(n)$,

$$\sum_{s<t_i\le t}E\bigl[(\sqrt{n}(\Delta A_d(t_i)-E(\Delta A_d(t_i)|D_n)))^{4}\,\big|\,D_n\bigr]\le\int_s^t V_n\,dN_n(u)\to 0 \tag{18}$$

with probability 1.

On the other hand, let

$$W_{ni}=E((\Delta A_d(t_i))^{2}|D_n)-(E(\Delta A_d(t_i)|D_n))^{2}-\left[\frac{B(3,Y_n^+(t_i)+1)}{B(1,Y_n^+(t_i)+1)}-\Bigl(\frac{B(2,Y_n^+(t_i)+1)}{B(1,Y_n^+(t_i)+1)}\Bigr)^{2}\right].$$

Lemma 2 together with (12) in Lemma 3 and (13) in Lemma 4 yields
$\sup_{i=1,\ldots,q_n}|W_{ni}|=O(n^{-2-\alpha})$. Hence

$$\operatorname{Var}(Z_n(t)-Z_n(s)|D_n)=\sum_{s<t_i\le t}n\bigl[E((\Delta A_d(t_i))^{2}|D_n)-(E(\Delta A_d(t_i)|D_n))^{2}\bigr]$$

$$\qquad=\int_s^t\frac{n}{Y_n(u)}\,\frac{Y_n^{2}(u)(Y_n^+(u)+1)}{(Y_n^+(u)+2)^{2}(Y_n^+(u)+3)}\,\frac{dN_n(u)}{Y_n(u)}+\sum_{s<t_i\le t}nW_{ni}.$$

Since

$$\sup_{u\in[0,\tau]}\left|\frac{n}{Y_n(u)}-Q(u)^{-1}\right|\to 0$$

and

$$\sup_{u\in[0,\tau]}\left|\frac{Y_n^{2}(u)(Y_n^+(u)+1)}{(Y_n^+(u)+2)^{2}(Y_n^+(u)+3)}-1\right|\to 0$$

with probability 1, we have by Lemma 6,

$$\int_s^t\frac{n}{Y_n(u)}\,\frac{Y_n^{2}(u)(Y_n^+(u)+1)}{(Y_n^+(u)+2)^{2}(Y_n^+(u)+3)}\,\frac{dN_n(u)}{Y_n(u)}\to U_0(t)-U_0(s)$$

uniformly in $s$ and $t$ with probability 1. Since

$$\sum_{s<t_i\le t}n|W_{ni}|\le n^{2}\sup_{i=1,\ldots,q_n}|W_{ni}|=O(n^{-\alpha}),$$

we obtain

$$\sup_{s,t\in[0,\tau]}|\operatorname{Var}(Z_n(t)-Z_n(s)|D_n)-(U_0(t)-U_0(s))|\to 0 \tag{19}$$

with probability 1. Now (18) and (19) imply the convergence of the
finite dimensional posterior distributions of $Z_n$ to those of
$W(U_0)$ with probability 1.

Finally, note that

$$\Pr\{|Z_n(t)-Z_n(s)|>\varepsilon\,|\,D_n\}\le\frac{1}{\varepsilon^{2}}\operatorname{Var}(Z_n(t)-Z_n(s)|D_n).$$

By (19), we have

$$\operatorname{Var}(Z_n(t)-Z_n(s)|D_n)=U_0(t)-U_0(s)+o(1)$$

with probability 1. Since $U_0(t)$ is continuous, with probability 1 we
can make $\Pr\{|Z_n(t)-Z_n(s)|>\varepsilon\,|\,D_n\}$ as small as
desired for sufficiently large $n$ by choosing $t$ and $s$
sufficiently close. Hence by Theorem 19 in Section V.4 in Pollard
(1984) we conclude that $Z_n$ given $D_n$ converges weakly to
$W(U_0)$ on $D[0,\tau]$ with probability 1. □
Proof of Theorem 1(iii). Let
$W_{ni}=E(\Delta A_d(t_i)|D_n)-1/Y_n^+(t_i)$. Then Lemma 4 yields
$\sup_{i=1,\ldots,q_n}|W_{ni}|=O(n^{-1-\alpha})$. Since

$$E(A_d(t)|D_n)=\int_0^t\frac{Y_n(s)}{Y_n^+(s)}\,\frac{dN_n(s)}{Y_n(s)}+\sum_{t_i\le t}W_{ni},$$

we have

$$\sup_{t\in[0,\tau]}|E(A_d(t)|D_n)-\hat A_n(t)|\le\int_0^{\tau}\left|\frac{Y_n(s)}{Y_n^+(s)}-1\right|\frac{dN_n(s)}{Y_n(s)}+O(n^{-\alpha}). \tag{20}$$

Since the first term on the right-hand side of (20) is $O(n^{-1})$ by
Lemma 6, the proof is done. □



APPENDIX A 



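Proving Lemma 1. Let

\[
B_\alpha(s) = \int_0^1 x^\alpha (1-x)^{Y_n^+(s)} \, dx = \frac{\Gamma(\alpha+1)\,\Gamma(Y_n^+(s)+1)}{\Gamma(Y_n^+(s)+\alpha+2)}.
\]

Then Lemma 2 yields that

\[
\sup_{s \in [0,\tau]} \left| (Y_n^+(s)+1)^{\alpha+1} B_\alpha(s) - \Gamma(\alpha+1) \right| \to 0 \tag{21}
\]

and

\[
\sup_{s \in [0,\tau]} \left| (Y_n^+(s)+1) B_\alpha(s) \right| = O(n^{-\alpha}) \tag{22}
\]

with probability 1 for $\alpha > 0$.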
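As a numerical sanity check on the closed form of $B_\alpha(s)$ and on the limits (21) and (22), the following sketch treats $Y_n^+(s)$ as a plain number `y` and lets it grow. The function names and the midpoint-rule comparison are illustrative assumptions, not part of the original argument.

```python
import math


def beta_integral(alpha: float, y: float) -> float:
    """Closed form Gamma(alpha+1) Gamma(y+1) / Gamma(y+alpha+2), via log-gamma."""
    return math.exp(
        math.lgamma(alpha + 1) + math.lgamma(y + 1) - math.lgamma(y + alpha + 2)
    )


def midpoint_integral(alpha: float, y: float, m: int = 100_000) -> float:
    """Midpoint-rule approximation of int_0^1 x^alpha (1 - x)^y dx."""
    h = 1.0 / m
    return h * sum(
        ((k + 0.5) * h) ** alpha * (1.0 - (k + 0.5) * h) ** y for k in range(m)
    )


def scaled_ratio(alpha: float, y: float) -> float:
    """(y+1)^(alpha+1) * B_alpha / Gamma(alpha+1); (21) says this tends to 1."""
    return (y + 1) ** (alpha + 1) * beta_integral(alpha, y) / math.gamma(alpha + 1)


if __name__ == "__main__":
    alpha = 0.3  # hypothetical value chosen for illustration
    for y in (10, 1_000, 100_000):
        # Second column should approach 1 (limit (21));
        # third column should shrink like y^(-alpha) (bound (22)).
        print(y, scaled_ratio(alpha, y), (y + 1) * beta_integral(alpha, y))
```

Working on the log scale with `math.lgamma` avoids overflow in the gamma functions when `y` is large.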

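Lemma 5. With probability 1, we have:

(i)

\[
\sup_{s \in [0,\tau]} \left| \frac{Y_n(s) B_1(s)}{B_0(s) + B_\alpha(s)} - 1 \right| = O(n^{-\min\{1,\alpha\}}). \tag{23}
\]

(ii) if $\alpha \le 1/2$,

\[
\sup_{s \in [0,\tau]} \left| n^\alpha \left( \frac{Y_n(s) B_1(s)}{B_0(s) + B_\alpha(s)} - 1 \right) + \Gamma(\alpha+1) Q(s)^{-\alpha} \right| \to 0. \tag{24}
\]

(iii)

\[
\sup_{s \in [0,\tau]} \left| \frac{n^\alpha Y_n(s) B_{\alpha+1}(s)}{B_0(s) + B_\alpha(s)} - \Gamma(\alpha+2) Q(s)^{-\alpha} \right| \to 0 \tag{25}
\]

for $\alpha > 0$.
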
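Proof. For (23), (22) yields

\[
\begin{aligned}
\left| \frac{Y_n(s) B_1(s)}{B_0(s) + B_\alpha(s)} - 1 \right|
&= \left| \frac{Y_n(s) - Y_n^+(s) - 2}{(Y_n^+(s)+2)\big(1 + (Y_n^+(s)+1) B_\alpha(s)\big)} - \frac{(Y_n^+(s)+1) B_\alpha(s)}{1 + (Y_n^+(s)+1) B_\alpha(s)} \right| \\
&\le \frac{2}{Y_n(\tau)} + \sup_{s \in [0,\tau]} \left| (Y_n^+(s)+1) B_\alpha(s) \right| \\
&= O(n^{-1}) + O(n^{-\alpha}) \\
&= O(n^{-\min\{1,\alpha\}})
\end{aligned}
\]

with probability 1.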
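For (24), note that

\[
\begin{aligned}
n^\alpha \left( \frac{Y_n(s) B_1(s)}{B_0(s) + B_\alpha(s)} - 1 \right) + \Gamma(\alpha+1) Q(s)^{-\alpha}
&= \frac{n^\alpha (Y_n(s) - Y_n^+(s) - 2)}{(Y_n^+(s)+2)\big(1 + (Y_n^+(s)+1) B_\alpha(s)\big)} && (26) \\
&\quad - \left( \frac{n^\alpha (Y_n^+(s)+1) B_\alpha(s)}{1 + (Y_n^+(s)+1) B_\alpha(s)} - \Gamma(\alpha+1) Q(s)^{-\alpha} \right). && (27)
\end{aligned}
\]

Since $\alpha \le 1/2$, $\sup_{s \in [0,\tau]} |(26)| \le 2 n^\alpha / Y_n(\tau) \to 0$ with probability 1. For (27),
let $p(s) = \Gamma(\alpha+1) Q(s)^{-\alpha}$. Then

\[
|(27)| \le \left| n^\alpha (Y_n^+(s)+1) B_\alpha(s) - p(s) \right| + \left| p(s) (Y_n^+(s)+1) B_\alpha(s) \right|.
\]

Here

\[
\begin{aligned}
\left| n^\alpha (Y_n^+(s)+1) B_\alpha(s) - p(s) \right|
&\le \Gamma(\alpha+1) \left| \left( \frac{n}{Y_n^+(s)+1} \right)^{\alpha} - Q(s)^{-\alpha} \right| \frac{(Y_n^+(s)+1)^{\alpha+1} \Gamma(Y_n^+(s)+1)}{\Gamma(Y_n^+(s)+\alpha+2)} \\
&\quad + \Gamma(\alpha+1) Q(s)^{-\alpha} \left| \frac{(Y_n^+(s)+1)^{\alpha+1} \Gamma(Y_n^+(s)+1)}{\Gamma(Y_n^+(s)+\alpha+2)} - 1 \right|.
\end{aligned}
\]

Since $\sup_{t \in [0,\tau]} Y_n^+(t) = O(n)$, we conclude $\sup_{s \in [0,\tau]} | n^\alpha (Y_n^+(s)+1) B_\alpha(s) - p(s) | \to 0$
with probability 1 by (9). Also we have $\sup_{s \in [0,\tau]} | p(s) (Y_n^+(s)+1) B_\alpha(s) | \to 0$
with probability 1 by (22), which proves (24).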
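For (25), we have

\[
\frac{n^\alpha Y_n(s) B_{\alpha+1}(s)}{B_0(s) + B_\alpha(s)}
= \left( \frac{n}{Y_n(s)} \right)^{\alpha} \frac{1}{1 + (Y_n^+(s)+1) B_\alpha(s)} \left( \frac{Y_n(s)}{Y_n^+(s)+1} \right)^{1+\alpha} (Y_n^+(s)+1)^{2+\alpha} B_{\alpha+1}(s)
\to \Gamma(\alpha+2) Q(s)^{-\alpha}
\]

uniformly in $s \in [0,\tau]$ with probability 1. □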
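The three statements of Lemma 5 can be checked numerically at a single point $s$. The sketch below is only an illustration under two simplifying assumptions that are not in the text: $Y_n(s)$ is set to exactly $nQ(s)$, and $Y_n^+(s)$ is taken to be $Y_n(s) - 1$. The function names are hypothetical.

```python
import math


def beta_integral(alpha: float, y: float) -> float:
    """B_alpha with Y_n^+(s) = y: Gamma(alpha+1) Gamma(y+1) / Gamma(y+alpha+2)."""
    return math.exp(
        math.lgamma(alpha + 1) + math.lgamma(y + 1) - math.lgamma(y + alpha + 2)
    )


def lemma5_residuals(n: int, alpha: float, q: float):
    """Quantities inside the suprema of (23), (24) and (25) at one point s.

    Assumptions (illustrative only): Y_n(s) = n * q and Y_n^+(s) = Y_n(s) - 1,
    where q plays the role of Q(s).
    """
    y_n = n * q
    y_plus = y_n - 1.0
    denom = beta_integral(0.0, y_plus) + beta_integral(alpha, y_plus)
    r23 = y_n * beta_integral(1.0, y_plus) / denom - 1.0
    r24 = n ** alpha * r23 + math.gamma(alpha + 1) * q ** (-alpha)
    r25 = (
        n ** alpha * y_n * beta_integral(alpha + 1.0, y_plus) / denom
        - math.gamma(alpha + 2) * q ** (-alpha)
    )
    return r23, r24, r25


if __name__ == "__main__":
    for n in (10**3, 10**4, 10**5, 10**6):
        # r23 should be of order n^(-min(1, alpha)); r24 and r25 should shrink to 0.
        print(n, lemma5_residuals(n, alpha=0.3, q=0.5))
```

The residuals for (24) and (25) decay only at rate $n^{-\alpha}$ in this setup, so large `n` is needed before they look small.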



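Proof of Lemma 1. Note that

\[
E(A^d(t) \mid D_n) = \int_0^t \frac{1}{B_0(s) + B_\alpha(s)} \int_0^1 x (1-x)^{Y_n^+(s)} (1 + x^\alpha) \, dx \, dN_n(s).
\]

Hence, we have

\[
\begin{aligned}
E(A^d(t) \mid D_n) - A_n(t)
&= \int_0^t \frac{B_1(s) + B_{\alpha+1}(s)}{B_0(s) + B_\alpha(s)} \, dN_n(s) - A_n(t) \\
&= \int_0^t \left( \frac{Y_n(s) B_1(s)}{B_0(s) + B_\alpha(s)} - 1 \right) \frac{dN_n(s)}{Y_n(s)} + \int_0^t \frac{Y_n(s) B_{\alpha+1}(s)}{B_0(s) + B_\alpha(s)} \, \frac{dN_n(s)}{Y_n(s)}.
\end{aligned}
\]
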
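For $0 < \alpha \le 1/2$, (24) in Lemma 5 and Lemma 6 yield

\[
\sup_{t \in [0,\tau]} \left| \int_0^t n^\alpha \left( \frac{Y_n(s) B_1(s)}{B_0(s) + B_\alpha(s)} - 1 \right) \frac{dN_n(s)}{Y_n(s)} + \int_0^t \Gamma(\alpha+1) Q(s)^{-\alpha} \, dA_0(s) \right| \to 0 \tag{28}
\]

with probability 1, and (25) in Lemma 5 and Lemma 6 imply

\[
\sup_{t \in [0,\tau]} \left| \int_0^t \frac{n^\alpha Y_n(s) B_{\alpha+1}(s)}{B_0(s) + B_\alpha(s)} \, \frac{dN_n(s)}{Y_n(s)} - \int_0^t \Gamma(\alpha+2) Q(s)^{-\alpha} \, dA_0(s) \right| \to 0 \tag{29}
\]

with probability 1. Combining (28) and (29), we have

\[
\sup_{t \in [0,\tau]} \left| n^\alpha \big( E(A^d(t) \mid D_n) - A_n(t) \big) - \int_0^t \big( \Gamma(\alpha+2) - \Gamma(\alpha+1) \big) Q(s)^{-\alpha} \, dA_0(s) \right| \to 0
\]

with probability 1.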
For $\alpha > 1/2$, (23) in Lemma 5 and Lemma 6 yield

\[
\sup_{t \in [0,\tau]} \left| \int_0^t n^{1/2} \left( \frac{Y_n(s) B_1(s)}{B_0(s) + B_\alpha(s)} - 1 \right) \frac{dN_n(s)}{Y_n(s)} \right| \to 0 \tag{30}
\]

with probability 1, and (25) in Lemma 5 and Lemma 6 imply

\[
\sup_{t \in [0,\tau]} \left| \int_0^t \frac{n^{1/2} Y_n(s) B_{\alpha+1}(s)}{B_0(s) + B_\alpha(s)} \, \frac{dN_n(s)}{Y_n(s)} \right| \to 0 \tag{31}
\]

with probability 1. Combining (30) and (31), we have

\[
\sup_{t \in [0,\tau]} \left| n^{1/2} \big( E(A^d(t) \mid D_n) - A_n(t) \big) \right| \to 0
\]

with probability 1. □



APPENDIX B



Technical lemmas.



Lemma 6. Let $X_1(t), X_2(t), \ldots$ be stochastic processes defined on $[0,\tau]$.
Suppose that there exists a continuous function $X(t)$ defined on $[0,\tau]$ such
that

\[
\lim_{n \to \infty} \sup_{t \in [0,\tau]} |X_n(t) - X(t)| = 0
\]

with probability 1. Then

\[
\sup_{t \in [0,\tau]} \left| \int_0^t X_n(s) \frac{1}{Y_n(s)} \, dN_n(s) - \int_0^t X(s) \, dA_0(s) \right| \to 0
\]

with probability 1.

Proof. This lemma is an easy consequence of the Glivenko-Cantelli
theorem and Lemma A.2 in Tsiatis (1981). □
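The statement of Lemma 6 can be illustrated by simulation in its simplest case, where the weight $X_n$ is a fixed deterministic function. The sketch below makes assumptions that are not in the text: exponential survival times with rate 1 (so $A_0(t) = t$), independent exponential censoring with rate 0.5, no tied times, and hypothetical function names.

```python
import random


def weighted_nelson_aalen(times, events, weight, t):
    """Compute int_0^t weight(s) dN_n(s) / Y_n(s) from right-censored data.

    times[i] is the observed time min(survival, censoring); events[i] is True
    when the survival time was observed. Assumes no tied times.
    """
    at_risk = len(times)  # Y_n at the smallest observed time
    total = 0.0
    for time, event in sorted(zip(times, events)):
        if time > t:
            break
        if event:
            total += weight(time) / at_risk
        at_risk -= 1  # this subject leaves the risk set
    return total


def simulate(n, seed=0):
    """Draw n right-censored observations (assumed model, for illustration)."""
    rng = random.Random(seed)
    times, events = [], []
    for _ in range(n):
        survival = rng.expovariate(1.0)   # true cumulative hazard A_0(t) = t
        censoring = rng.expovariate(0.5)
        times.append(min(survival, censoring))
        events.append(survival <= censoring)
    return times, events


if __name__ == "__main__":
    times, events = simulate(20_000)
    # With weight 1 the target is A_0(1) = 1; with weight s it is 1/2.
    print(weighted_nelson_aalen(times, events, lambda s: 1.0, 1.0))
    print(weighted_nelson_aalen(times, events, lambda s: s, 1.0))
```

With weight identically 1 the integral is the Nelson-Aalen estimator $A_n(t)$, so the first printed value should be close to $A_0(1) = 1$.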

Lemma 7. Let $X_n$ be a sequence of subordinators such that $E(X_n(t)) \to X_0(t)$
and $\operatorname{Var}(X_n(t)) \to 0$ for some continuous function $X_0(t)$ and all
$t \in [0,\tau]$. Then $\mathcal{L}(X_n) \xrightarrow{d} \delta_{X_0}$ on $D[0,\tau]$.

Proof. Note that $X_0$ must be a monotonically increasing function
since the $X_n$ are subordinators. Hence, the continuity of $X_0$ together with the
assumptions implies that $\sup_{t \in [0,\tau]} |X_n(t) - X_0(t)| \to 0$ in probability. □